National Repository of Grey Literature
Automatic Webpage Reconstruction
Serečun, Viliam ; Ryšavý, Ondřej (referee) ; Veselý, Vladimír (advisor)
Many legal institutions require proof of web content. This thesis deals with the problem of web page reconstruction and archiving. The primary goal is to provide an open-source solution that meets the requirements of legal institutions. This work presents two main products. The first is a framework that serves as a fundamental building block for developing web scraping and web archiving applications. The second is a web application prototype that demonstrates how the framework is used. The application outputs a MAFF archive file comprising the reconstructed web page, a screenshot of the page, and a meta information table. This table shows information about the collected data, server information such as the IP addresses and ports of the device hosting the original web page, and a timestamp.
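The archive output described above can be pictured with a minimal, standard-library Python sketch. It is not the thesis's framework: it only assumes MAFF's convention of a ZIP file with one top-level folder per archived page, while the file names `screenshot.png` and `meta.csv`, the CSV form of the meta information table, and the helper names `build_archive` and `server_port` are all hypothetical. Real MAFF files also carry RDF metadata, which this sketch omits.

```python
import csv
import io
import zipfile
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlsplit


def server_port(url: str) -> int:
    """Return the URL's explicit port, or the default port for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return 443 if parts.scheme == "https" else 80


def build_archive(url: str, html: str, screenshot_png: bytes, server_ip: str,
                  captured_at: Optional[datetime] = None,
                  folder: str = "page0") -> bytes:
    """Pack one captured page into a MAFF-like ZIP archive held in memory.

    Hypothetical layout (one top-level folder per page, as in MAFF):
        <folder>/index.html      reconstructed page
        <folder>/screenshot.png  screenshot of the page
        <folder>/meta.csv        meta information table
    """
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)

    # Meta information table: one "field,value" row per recorded item.
    table = io.StringIO()
    writer = csv.writer(table)
    writer.writerow(["field", "value"])
    writer.writerow(["url", url])
    writer.writerow(["server_ip", server_ip])
    writer.writerow(["server_port", server_port(url)])
    writer.writerow(["timestamp", captured_at.isoformat()])

    # Write all three parts into a single in-memory ZIP file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{folder}/index.html", html)
        zf.writestr(f"{folder}/screenshot.png", screenshot_png)
        zf.writestr(f"{folder}/meta.csv", table.getvalue())
    return buffer.getvalue()
```

The server IP is passed in rather than resolved so the function stays deterministic; in a real capture it could be obtained, for example, by resolving the host with `socket.getaddrinfo`.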
Web Page Archiving Tools
Kvačkaj, Matúš ; Rychlý, Marek (referee) ; Burget, Radek (advisor)
This bachelor's thesis deals with archiving and reproducing web pages. The aim was to provide a tool that, given a URL and parameters, creates a WARC archive of the page and also generates a textual description of it, suitable for further processing and analysis. The tool also supports the reverse process: replaying a site from a WARC archive and generating a textual description of the page. The tool was designed to be applied to an existing dataset as part of bulk data processing. The Webis-Web-Archive-17 dataset was used, which contains approximately 10,000 WARC archives collected since 2017. Docker containerization was used to ensure maximum portability of the tool.
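Both directions described above (writing a capture into a WARC record, reading it back, and deriving a plain-text description) can be sketched with the Python standard library alone. This is not the thesis's tool: it assumes uncompressed WARC/1.1 `response` records written by the same code (real archives are usually gzip-compressed per record and header names are case-insensitive, so a dedicated WARC library is the safer choice), and the `Title:`/`Text:` description format and all function names are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from html.parser import HTMLParser

CRLF = b"\r\n"


def make_response_record(target_uri: str, http_response: bytes, date: datetime) -> bytes:
    """Serialize one WARC/1.1 'response' record wrapping a raw HTTP response.

    `date` must be timezone-aware; it is converted to UTC for WARC-Date.
    """
    headers = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {date.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    head = CRLF.join(h.encode("utf-8") for h in headers)
    # Last header line's CRLF, blank line, content block, then CRLF CRLF ending the record.
    return head + CRLF + CRLF + http_response + CRLF + CRLF


def read_records(data: bytes):
    """Yield (headers, block) pairs from concatenated, uncompressed WARC records.

    Simplified: assumes 'Name: value' header lines as written above.
    """
    pos = 0
    while pos < len(data):
        end = data.index(CRLF + CRLF, pos)
        lines = data[pos:end].decode("utf-8").split("\r\n")
        fields = dict(line.split(": ", 1) for line in lines[1:])  # skip version line
        length = int(fields["Content-Length"])
        block_start = end + 4
        yield fields, data[block_start:block_start + length]
        pos = block_start + length + 4  # skip the trailing CRLF CRLF


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def describe(html: str) -> str:
    """Return a hypothetical plain-text description: title plus visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return f"Title: {parser.title}\nText: {' '.join(parser.chunks)}"
```

To describe a replayed page, split the record block at the first blank line (`b"\r\n\r\n"`) to drop the HTTP headers, decode the body, and pass it to `describe`.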
Long-term Preservation of Web Content
Kvasnica, Jaroslav ; Pokorný, Jan (advisor) ; Souček, Martin (referee)
This work describes the long-term preservation of digital documents, particularly websites. It aims to explain long-term preservation, to define the differences between various approaches, and to describe possible strategies for the long-term preservation of web content, such as migration and emulation. It also explains the risks and challenges of these strategies, discusses the new problems that the goal of long-term preservation raises, and describes possible solutions as well as the situation in selected major foreign institutions. The main aim of this work is a detailed analysis of the long-term preservation strategy of the National Library of the Czech Republic, the only institution engaged in preserving the Czech web. The process of data preparation, metadata creation, and data storage in the long-term repository of the National Library of the Czech Republic is described in detail, including examples and their explanation. The conclusion sets out future actions for long-term preservation in the Czech Web Archive.
Comparative Analysis of WebArchiv of the National Library of the Czech Republic and Foreign Projects
Kupcová, Pavla ; Římanová, Radka (advisor) ; Bratková, Eva (referee)
This diploma thesis compares the WebArchiv with selected foreign web archives that are responsible for preserving national cultural heritage. The introduction briefly explains the history of web archives and the typology of harvesting. The following parts deal with the history, legal aspects of archiving, selected types of harvesting, web resources, systems, access, and evaluation of the Czech (WebArchiv), Australian (Pandora), and British (United Kingdom Web Archive) archives. The text continues with an evaluation of the selected archives that identifies their strengths and weaknesses and possible solutions. The conclusion outlines the problematic aspects of archiving that must be addressed in the future. [Author's abstract]